attention score matrix

SPAT: Sensitivity-based Multihead-attention Pruning on Time Series Forecasting Models

Guo, Suhan, Deng, Jiahong, Yi, Mengjun, Shen, Furao, Zhao, Jian

arXiv.org Artificial Intelligence

Attention-based architectures have achieved superior performance in multivariate time series forecasting but are computationally expensive. Techniques such as patching and adaptive masking have been developed to reduce their sizes and latencies. In this work, we propose a structured pruning method, SPAT ($\textbf{S}$ensitivity $\textbf{P}$runer for $\textbf{At}$tention), which selectively removes redundant attention mechanisms and yields highly effective models. Different from previous approaches, SPAT aims to remove the entire attention module, which reduces the risk of overfitting and enables speed-up without requiring specialized hardware. We propose a dynamic sensitivity metric, $\textbf{S}$ensitivity $\textbf{E}$nhanced $\textbf{N}$ormalized $\textbf{D}$ispersion (SEND), that measures the importance of each attention module during the pre-training phase. Experiments on multivariate datasets demonstrate that SPAT-pruned models achieve reductions of 2.842% in MSE, 1.996% in MAE, and 35.274% in FLOPs. Furthermore, SPAT-pruned models outperform existing lightweight, Mamba-based, and LLM-based SOTA methods in both standard and zero-shot inference, highlighting the importance of retaining only the most effective attention mechanisms. Our code is publicly available at https://anonymous.4open.science/r/SPAT-6042.
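The abstract does not specify the SEND formula, but the overall pruning recipe (score each attention module by a dispersion-based importance metric, then remove the lowest-scoring modules wholesale) can be sketched as follows. The variance-based `dispersion_score` is an illustrative stand-in for SEND, not the paper's metric:

```python
import numpy as np

def dispersion_score(attn_weights):
    """Toy importance score for one attention module: normalized
    dispersion of its attention weights. The paper's SEND metric is
    not specified in the abstract; this variance-based proxy is an
    assumption for illustration only."""
    w = np.asarray(attn_weights, dtype=float)
    return w.var() / (w.mean() ** 2 + 1e-8)

def select_modules_to_prune(score_per_module, prune_ratio=0.5):
    """Structured pruning step: return indices of the lowest-scoring
    attention modules, which are then removed entirely (no sparse
    masks, so no specialized hardware is needed for speed-up)."""
    scores = np.asarray(score_per_module, dtype=float)
    n_prune = int(len(scores) * prune_ratio)
    order = np.argsort(scores)  # ascending: least important first
    return sorted(order[:n_prune].tolist())
```

Because whole modules are dropped rather than individual weights, the remaining network is a plain dense model, which is what allows latency gains on commodity hardware.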


EDT: An Efficient Diffusion Transformer Framework Inspired by Human-like Sketching

Chen, Xinwang, Liu, Ning, Zhu, Yichen, Feng, Feifei, Tang, Jian

arXiv.org Artificial Intelligence

Transformer-based Diffusion Probabilistic Models (DPMs) have shown more potential than CNN-based DPMs, yet their extensive computational requirements hinder widespread practical application. To reduce the computation budget of transformer-based DPMs, this work proposes the Efficient Diffusion Transformer (EDT) framework. The framework includes a lightweight diffusion model architecture and a training-free Attention Modulation Matrix, whose alternating arrangement within EDT is inspired by human-like sketching. Additionally, we propose a token relation-enhanced masking training strategy tailored explicitly for EDT to augment its token relation learning capability. Our extensive experiments demonstrate the efficacy of EDT. The EDT framework reduces training and inference costs and surpasses existing transformer-based diffusion models in image synthesis performance, thereby achieving a significant overall enhancement. With lower FID, EDT-S, EDT-B, and EDT-XL attained speed-ups of 3.93x, 2.84x, and 1.92x respectively in the training phase, and 2.29x, 2.29x, and 2.22x respectively in inference, compared to the corresponding sizes of MDTv2. The source code is released here.
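A training-free modulation matrix can be understood as a fixed elementwise reweighting of the raw attention scores before the softmax. The concrete matrix EDT uses is not given in the abstract; the locality-emphasizing Gaussian below is a hypothetical example of such a matrix, loosely echoing the coarse-to-fine focus of sketching:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def modulated_attention(scores, modulation):
    """Apply a training-free modulation matrix to raw attention scores
    before the softmax. Any (n, n) weighting can be plugged in; EDT's
    actual matrix and placement may differ from this sketch."""
    return softmax(scores * modulation, axis=-1)

def locality_modulation(n, sigma=2.0):
    """Hypothetical example modulation: damp long-range token pairs so
    attention concentrates on nearby tokens."""
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    return np.exp(-(dist ** 2) / (2 * sigma ** 2))
```

Since the modulation is fixed rather than learned, it adds no parameters and no training cost, which is the sense in which it is "training-free".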


ViConsFormer: Constituting Meaningful Phrases of Scene Texts using Transformer-based Method in Vietnamese Text-based Visual Question Answering

Nguyen, Nghia Hieu, Quan, Tho Thanh, Nguyen, Ngan Luu-Thuy

arXiv.org Artificial Intelligence

Text-based VQA is a challenging task that requires machines to use scene texts in images to yield the most appropriate answer to a given question. The main challenge of text-based VQA is exploiting the meaning and information of scene texts. Recent studies tackled this challenge by considering the spatial information of scene texts in images via embedding the 2D coordinates of their bounding boxes. In this study, we follow the definition of meaning from linguistics to introduce a novel method that effectively exploits the information of scene texts written in Vietnamese. Experimental results show that our proposed method obtains state-of-the-art results on two large-scale Vietnamese text-based VQA datasets. The implementation can be found at this link.
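The spatial-embedding idea the abstract refers to (encoding each scene text's bounding box so the model can reason about layout) is commonly implemented by normalizing box coordinates into a small feature vector that a linear layer then projects into the model dimension. The exact feature set ViConsFormer uses is not stated here; this is the generic construction:

```python
import numpy as np

def box_coordinate_features(boxes, image_w, image_h):
    """Normalize scene-text bounding boxes (x1, y1, x2, y2) into 2D
    spatial features. The six-feature layout (corners plus width and
    height) is a common convention, assumed here for illustration."""
    b = np.asarray(boxes, dtype=float)
    x1 = b[:, 0] / image_w
    y1 = b[:, 1] / image_h
    x2 = b[:, 2] / image_w
    y2 = b[:, 3] / image_h
    w = x2 - x1
    h = y2 - y1
    return np.stack([x1, y1, x2, y2, w, h], axis=1)
```

Normalizing by the image size keeps the features resolution-independent, so the same embedding works across images of different sizes.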


MARE: Multi-Aspect Rationale Extractor on Unsupervised Rationale Extraction

Jiang, Han, Duan, Junwen, Qu, Zhe, Wang, Jianxin

arXiv.org Artificial Intelligence

Unsupervised rationale extraction aims to extract text snippets to support model predictions without explicit rationale annotation. Previous works often encode each aspect independently, which may limit their ability to capture meaningful internal correlations between aspects. While there has been significant work on mitigating spurious correlations, our approach focuses on leveraging beneficial internal correlations to improve multi-aspect rationale extraction. In this paper, we propose a Multi-Aspect Rationale Extractor (MARE) to explain and predict multiple aspects simultaneously. Concretely, we propose a Multi-Aspect Multi-Head Attention (MAMHA) mechanism based on hard deletion to encode multiple text chunks simultaneously. Furthermore, multiple special tokens are prepended in front of the text, each corresponding to a specific aspect. Finally, multi-task training is deployed to reduce the training overhead. Experimental results on two unsupervised rationale extraction benchmarks show that MARE achieves state-of-the-art performance. Ablation studies further demonstrate the effectiveness of our method. Our code is available at https://github.com/CSU-NLP-Group/MARE.
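Two of the mechanisms named above have straightforward shapes that can be sketched: prepending one special token per aspect, and "hard deletion", i.e. fully masking non-rationale tokens out of attention instead of down-weighting them. The token-id convention below is an illustrative assumption, not MARE's actual vocabulary layout:

```python
import numpy as np

def prepend_aspect_tokens(token_ids, num_aspects, aspect_token_base=90000):
    """Prepend one special token per aspect in front of the text, one
    per target aspect. The id scheme (aspect_token_base + i) is a
    hypothetical convention for illustration."""
    aspect_tokens = [aspect_token_base + i for i in range(num_aspects)]
    return aspect_tokens + list(token_ids)

def hard_deletion_mask(seq_len, keep_indices):
    """Hard deletion: tokens outside keep_indices receive an additive
    -inf attention mask, removing them from attention entirely rather
    than softly re-weighting them."""
    mask = np.full(seq_len, -np.inf)
    mask[list(keep_indices)] = 0.0
    return mask
```

Adding the mask to raw attention scores before the softmax drives the deleted positions' weights to exactly zero, which is what distinguishes hard deletion from soft selection.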


An alternative formulation of attention pooling function in translation

Conti, Eddie

arXiv.org Artificial Intelligence

The aim of this paper is to present an alternative formulation of the attention scoring function in translation tasks. Generally speaking, language is deeply structured, and this is reflected in the attention score matrix. We exploit this property to define an attention pooling function that takes this aspect into account. In the first chapters, we introduce the attention mechanism in mathematical terms and explain its limitations and alternative formulations. Next, we focus on the experimental session that led to the alternative formulation. Essentially, we guide queries and keys to interact in a specific manner, encoding the distinct roles of attention heads and directing values on where to seek context. In mathematical terms, we can think of this formula as projecting the attention score matrix, say $H$, onto the space of band matrices with fixed bandwidth. This convex subspace is finite-dimensional and therefore closed; as a consequence, the projection onto this space is well-posed and unique. However, at the price of losing the uniqueness of the projection (i.e., of the best approximation of $H$), we define a new space consisting of band matrices plus sparse error matrices. We prove that this is a compact subspace, which guarantees the existence of a matrix that best approximates $H$. We conclude by validating the new formula, namely by calculating how well it approximates the original attention scores. Additionally, we explore the impact of parameters such as $w$ (the context window) and num-pos (the number of relevant words in a sentence). These analyses provide deeper insights into how languages are processed and translated, revealing nuances in the roles of context and word relevance.
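The projection described above is concrete: in the Frobenius norm, the nearest band matrix of bandwidth $w$ to $H$ is obtained by zeroing every entry with $|i - j| > w$. The band-plus-sparse variant below keeps the $k$ largest off-band residual entries as the sparse error term; the paper's exact construction of that term may differ, so treat it as one plausible choice:

```python
import numpy as np

def project_to_band(H, w):
    """Frobenius-norm projection of the attention score matrix H onto
    the (closed, convex) space of band matrices with bandwidth w:
    zero every entry with |i - j| > w."""
    H = np.asarray(H, dtype=float)
    i, j = np.indices(H.shape)
    return np.where(np.abs(i - j) <= w, H, 0.0)

def band_plus_sparse(H, w, k):
    """Band part plus a sparse error term keeping the k largest-magnitude
    off-band entries of H. This is one natural sparse correction; the
    paper's definition of the error matrices may differ."""
    H = np.asarray(H, dtype=float)
    B = project_to_band(H, w)
    R = H - B                       # off-band residual
    S = np.zeros_like(R).ravel()
    if k > 0:
        keep = np.argsort(np.abs(R).ravel())[-k:]
        S[keep] = R.ravel()[keep]   # retain only the k largest residuals
    return B + S.reshape(R.shape)
```

Since entrywise truncation minimizes the Frobenius distance over the band subspace, `project_to_band` really is the unique best band approximation, matching the well-posedness claim in the abstract.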